New LaTeX commands declared in this cell. $$ \newcommand\abs[1]{\left|#1\right|} \newcommand\dx[2]{\frac{\partial#1}{\partial#2}} \newcommand\ddx[2]{\frac{\partial^2#1}{\partial#2^2}} \newcommand\divv[1]{\nabla\cdot\vect{#1}} \newcommand\curl[1]{\nabla\times\vect{#1}} \newcommand\grad[1]{\nabla\vect{#1}} \newcommand{\ket}[1]{\left|#1\right\rangle} \newcommand{\bra}[1]{\left\langle#1\right|} \newcommand{\braket}[1]{\langle#1\rangle} \newcommand{\aint}[0]{\int_{-\infty}^{\infty}} \newcommand{\vect}[1]{\mathbf{#1}} \newcommand{\order}[1]{\mathcal{O}{#1}} $$
The following discussion assumes evenly spaced steps in time $\Delta t$. The Fourier series of a function is its decomposition into individual sines and cosines:
$$ y(t) = \sum_{j=1}^n y_j \sin(2 \pi f_j t + \phi_j), \qquad \text{with } f_j = \frac{\omega_j}{2\pi}. $$
If we take this to the continuum limit, summing over all frequencies,
$$ y(t) = \int_{-\infty}^{\infty} Y(f) e^{-2\pi ift}\, df = \frac{1}{2\pi} \int_{-\infty}^{\infty} Y\!\left(\frac{\omega}{2\pi}\right) e^{-i\omega t}\, d\omega \\ Y(f) = \int_{-\infty}^{\infty} y(t)\, e^{2\pi i f t}\, dt$$The two expressions are equivalent, though the $\omega$ version is more commonly seen in the context of quantum mechanics. The forms above are commonly called the forward Fourier transforms, though this has no deep meaning. The idea of going 'forward' likely results from our far more pedestrian experience of time as a dimension than frequency; we transform 'forward' to a new domain, and then inverse transform to return to 'normal'. This reasoning holds for the other common Fourier transforms as well; consider that position space is far more familiar to us than momentum space!
Another tool that will be useful is the Dirac delta function, the result of transforming forward, then back, and expecting your original function to return (a reasonable hope):
$$ \int_{-\infty}^{\infty} e^{i\omega(t - t')} d\omega = 2\pi \delta(t-t') $$This function has the property of being zero everywhere except where $t = t'$. It has the following property when integrated with a function:
$$ \int y(t') \delta(t-t') dt' = y(t) $$It thus has the inverse units of your integration variable.
Consider discretizing the above definition of the Fourier transform:
$$ \text{For } t = m \Delta t \text{, and } f_n = n/(N\Delta t): \\ y_m = \sum_{n=0}^{N-1} Y_n e^{-2\pi imn/N} \qquad Y_n = \frac{1}{N} \sum_{m=0}^{N-1} y_m e^{2 \pi i m n / N} $$We can recreate the discrete version of a Dirac delta function with these definitions; the Kronecker delta:
$$ \sum_{n=0}^{N-1}e^{2 \pi i n(m - m')/N} = N \delta_{m,m'} $$Note that $\Delta t$ has disappeared from our discrete FT. There are subtleties involving the amount of information contained in the DFT between real and complex cases.
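A quick numeric check of the Kronecker delta identity (a sketch, independent of the pfft module used below):
import numpy as np
# evaluate the sum for one matching and one mismatched index pair
N = 8
n = np.arange(N)
for m, mp in [(3, 3), (3, 5)]:
    s = np.exp(2j * np.pi * n * (m - mp) / N).sum()
    print(m, mp, np.round(s, 12))  # expect N when m == m', 0 otherwise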
The DFT as implemented below loops through the time and frequency indices, $m$ and $n$, to compute matrix elements $$\bra{n} M \ket{m}$$, where $M$ is the DFT matrix. Since this matrix has $N\times N$ elements, the computation is $\mathcal{O}(N^2)$, which quickly becomes cumbersome. The Fast Fourier Transform, a much more efficient algorithm for computing the same transform, is discussed in the next section.
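The pdft module itself is not reproduced here, but a minimal sketch of the looped version (the name dft_loop is illustrative; the $1/N$ normalization follows the convention above) might look like:
import numpy as np

def dft_loop(y):
    """Illustrative O(N^2) DFT: accumulate each matrix element <n|M|m> in a double loop."""
    N = len(y)
    Y = np.zeros(N, dtype=complex)
    for n in range(N):
        for m in range(N):
            Y[n] += y[m] * np.exp(2j * np.pi * m * n / N) / N
    return Y
The vectorize=True variant presumably builds the full exponential matrix in one shot and lets numpy perform the sum, which is where its speedup in the timings below comes from.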
The Nyquist critical frequency is defined as $$ f_c = \frac{1}{2\Delta t} $$ The simplest case to consider is a sine wave: to properly sample it, you must take at least 2 points per cycle. This means you need to sample at a minimum of twice the maximum frequency of your data to preserve information below that frequency. In a real device, such as an experimental setup limited by the response of particular electrical cables, there will probably be natural frequency cutoffs, and you need to sample at twice the maximum frequency you care about. For example, human hearing cuts off at around 20 kHz, so to properly sample audio recordings for future listening (on headphones, for instance), you need a sampling rate of at least twice this frequency. It's no accident that audio CDs sample at 44.1 kHz!
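As a quick check of those CD numbers (a sketch):
dt_cd = 1 / 44100          # CD sample spacing in seconds
f_c = 1 / (2 * dt_cd)      # Nyquist critical frequency
print(f_c)                 # 22050.0 Hz, just above the ~20 kHz limit of human hearing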
import numpy as np
import timeit
from matplotlib import pyplot as plt
from pfft import *
np.random.seed(2)
func = np.random.random(1000)
a = pdft.dft(func, vectorize=False)
b = pdft.dft(func, vectorize=True)
c = pdft.dft_no_matrix(func)
# Compare runtimes for these dfts to the fft
%timeit pdft.dft_no_matrix(func)
%timeit pdft.dft(func, vectorize=False)
%timeit pdft.dft(func, vectorize=True)
%timeit np.fft.fft(func)
1.1 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.9 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
43.4 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
14.7 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
"""To properly fit in the box with periodic boundary conditions,
we need the sine function to end at point N+1, not point N. Otherwise
we have two zeros next to each other, and there will be artefacts
present in the frequency spectrum. Specify your time step and choose
a compatible frequency. """
N = 2**10 # number of sample points
Tmax = 2 # s
#ts = np.linspace(0, 200/f, N)
#dt = np.diff(ts)[-1]
dt = Tmax/N
ts = np.arange(N) * dt
f = (20)/(Tmax) # frequency Hz
omega = 2 * np.pi * f # angular frequency
fs = np.linspace(0, 1/dt, N)
y0 = np.sin(omega * ts)
f_nyquist = 1 / 2 / np.diff(ts)[-1]
fs2 = pdft.dft_no_matrix(y0)
fs1 = pdft.dft(y0)
ys1 = pdft.idft(fs1)
f_nyquist, 1 / dt, f
(256.0, 512.0, 10.0)
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.plot(ts, y0.real)
plt.plot(ts, y0.imag)
plt.xlabel('ts')
plt.ylabel('ys(t)')
plt.xlim(0, ts[-1])
plt.subplot(122)
plt.plot(ts, ys1.real)
plt.plot(ts, ys1.imag)
plt.xlabel('ts')
plt.ylabel('idft(dft(ys))')
plt.xlim(0, ts[-1])
(0, 1.998046875)
plt.plot(fs, pdft.dft(y0).real,'.')
plt.plot(fs, pdft.dft(y0).imag,'.')
plt.axvline(f, linewidth=0.5, linestyle='--')
<matplotlib.lines.Line2D at 0x7f318af164f0>
If the datatype is not set to complex, you will be discarding the phase of your DFT;
$$ e^{i\theta} = \cos{\theta} + i \sin{\theta} $$
With only half of the information, you will introduce phase shifts into the Fourier transform. This is easily tested by changing the functions above to read dtype=float.
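A one-line illustration (a sketch): casting $e^{i\theta}$ to a real float keeps only the cosine piece, so any phase information carried by the sine is lost.
import numpy as np
theta = np.pi / 4
z = np.exp(1j * theta)
print(z.real, np.cos(theta))  # identical: the imaginary (sine) part has been dropped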
plt.figure(figsize=(12,5))
plt.subplot(121)
plt.plot(fs,fs1.real)
plt.plot(fs,fs1.imag)
plt.axvline(f, linestyle='--')
plt.xlabel('frequency')
plt.ylabel('ft amplitude')
plt.subplot(122)
plt.plot(y0.real, ys1.real)
plt.xlabel('input function')
plt.ylabel('idft(dft(input))')
Text(0, 0.5, 'idft(dft(input))')
The Fast Fourier Transform (FFT) is a ubiquitous algorithm for computing the Fourier transform of a dataset in $\mathcal{O}(N \log N) $ time. Compare the difference in the number of operations needed for the two cases.
Ns = np.logspace(0,1.5,1000)
plt.plot(Ns, Ns**2)
plt.plot(Ns, Ns * np.log2(Ns))
plt.xlabel('Number of samples')
plt.ylabel('approx. calculations needed')
Text(0, 0.5, 'approx. calculations needed')
Writing $w = e^{-2\pi i / N}$, the first even/odd split of the transform reads $Y_n = Y^e_n + w^n Y^o_n$. For each level, we can recursively split the evens and odds again and get twice as many terms:
$$ Y^e_n = Y^{ee}_n + w^{2n} Y^{eo}_n \\ Y^o_n = Y^{oe}_n + w^{2n} Y^{oo}_n \\ $$split until each $Y^{eoeo\ldots}_n$ term is a single element. For the $N=8$ example, this means three iterations of dividing the input array.
$$ Y^{ee}_n = Y^{eee}_{(0)} + w^{4n} Y^{eeo}_{(4)} \\ Y^{eo}_n = Y^{eoe}_{(2)} + w^{4n} Y^{eoo}_{(6)} \\ Y^{oe}_n = Y^{oee}_{(1)} + w^{4n} Y^{oeo}_{(5)} \\ Y^{oo}_n = Y^{ooe}_{(3)} + w^{4n} Y^{ooo}_{(7)} \\ $$Recall what these Y terms are. Each is a broken-down component of the initial array $Y_n$, $n \in [0,8).$ A string like eeo, eoe, etc., represents a reversed binary representation of the index into $Y_n$, with $e\rightarrow 0, o \rightarrow 1$. If you flip the text string, you recover the binary representation of the index. An example is essential here.
$$ Y^{eoo}: eoo \rightarrow 011; \text{reversed, } 110 \rightarrow Y_6 \\ Y^{eoo} = Y_6 $$So we can bit-reverse the index of the original array to find the new location of that element after the FFT algorithm begins working.
The point of bit reversing is a bookkeeping shortcut. Rather than splitting down the levels, sorting even/odd elements until they come out as the new array
$$[y_0, y_4, y_2, y_6, y_1, y_5, y_3, y_7],$$we can jump straight to the final ordering. There is a simple pattern here that is not obvious at first glance. If we convert the indices to binary, with the number of bits equal to $\log_2(\mathrm{len}(y))$, we get
$$[y_{000}, y_{100}, y_{010}, y_{110}, y_{001}, y_{101}, y_{011}, y_{111}]$$And, if we reverse the order of each of these bits, the final result is
$$[y_{000}, y_{001}, y_{010}, y_{011}, y_{100}, y_{101}, y_{110}, y_{111}]$$a slick trick that saves time by instantly swapping each input element to the place the even/odd sorting would have put it.
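A quick check of the pattern (a sketch, independent of the bit_reverse helper defined later):
# reverse the 3-bit binary representation of each index 0..7
idx = [int('{:03b}'.format(i)[::-1], 2) for i in range(8)]
print(idx)  # [0, 4, 2, 6, 1, 5, 3, 7], the even/odd sorted order above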
Below is a sketch of the recursive fft algorithm. Note that this does not utilize the bit reversal technique; deriving this method led to the recognition of bit reversal as a bookkeeping trick, which is implemented in Iterative FFT below.
def _recursive_fft(data):
"""recursively compute the fft of the input data."""
N = int(len(data))
if N == 1:
        return data  # the FT of a single element is itself
w_N = np.exp(-2j * np.pi / N)
w = 1.0 + 0.0j
data_even = data[0::2] # starts from 0 and moves two elements until N-2
data_odd = data[1::2] # starts from 1 and moves two elements until N-1
    y_e = _recursive_fft(data_even)  # recurse using this function's own name
    y_o = _recursive_fft(data_odd)
y = np.zeros([N], dtype=complex)
for k in range(0, N//2):
t = w * y_o[k] # butterfly operation
y[k] = y_e[k] + t
y[k+N//2] = y_e[k] - t
w = w * w_N
return y
# array slicing example
tes = np.linspace(0,31,32)
print(tes[:16][::2])
[ 0. 2. 4. 6. 8. 10. 12. 14.]
pfft.recursive_fft(data=y0)
array([8.39470213e-16+0.00000000e+00j, 1.08946670e-14-4.21946824e-15j, 2.04880249e-14+1.19737608e-14j, ..., 1.36039644e-14-3.03724287e-15j, 2.04880249e-14-1.19737608e-14j, 1.08946670e-14+4.21946824e-15j])
fft_fs = pfft.recursive_fft(y0)
fft_y0 = pfft.recursive_ifft(fft_fs)
plt.figure(figsize=(10,5))
ax1 = plt.subplot(121)
ax1.plot(ts, fft_y0.real, '-', label='rec. fft.r')
#ax1.plot(ts, fft_y0.imag, '-', label='rec. fft.im')
ax1.plot(ts, y0, linewidth=1., label='y0')
#plt.xlim(0, 5/f)
ax1.set(xlim=(0,4/f))
ax1.set(xlabel='time (s)', ylabel='ifft(fft) amplitude')
ax1.plot(ts, np.fft.ifft(fft_fs).real, '.', linewidth=0.5, label='npifft.r')
#ax1.plot(ts, np.fft.ifft(fft_fs).imag, '--', linewidth=0.3, label='npifft.im')
plt.legend()
ax2 = plt.subplot(122)
ax2.plot(fs, fft_fs.real)
ax2.plot(fs, fft_fs.imag)
ax2.axvline(f, linestyle='--', linewidth=0.5)
#plt.plot(fs, np.fft.fft(y0).real, '--')
#plt.plot(fs, np.fft.fft(y0).imag, '--')
#ax2.set(xlim=(fs[0], 10*f))
ax2.set(xlabel='frequency (Hz)', ylabel='fft amplitude')
ax2.set(xlim=(0, f_nyquist))
#ax2.axvline(fs[-1]/2)
#ax2.axvline(f_nyquist - f, linewidth=0.7, linestyle='-')
plt.tight_layout()
%timeit pfft.recursive_fft(y0)
%timeit np.fft.fft(y0)
8.09 ms ± 81.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.6 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The above recursive FFT is messy (I'm not sure where the extra frequency in the fft is coming from), and padding zeros is difficult without significantly more bookkeeping than I've done above. We will move from the bottom up in the tree describing the FFT, following the bit-reversal scheme.
def iterative_fft(cls, data):
    """Iteratively compute the FFT of data, using bit-reversed ordering."""
    data = data.astype(complex)
assert (np.log2(len(data)) % 1) == 0
A = cls.reverse_array(data)
N = int(len(A))
S = int(np.log2(N))
for s in range(1, S+1):
m = int(2**s) # length of split array at each level
w_m = np.exp(-2j * np.pi / m)
for k in range(0, N, m): # by m
w = 1.0 + 0.0j
m_2 = (m // 2)
for j in range(0, m_2):
t = w * A[k + j + m_2]
u = A[k + j]
A[k + j] = u + t
A[k + j + m_2] = u - t
w = w * w_m
return A
def bit_reverse(num, bits):
num = int(num)
return int('{:0{s}b}'.format(num, s=bits)[::-1], 2)
import numpy as np
import timeit
from matplotlib import pyplot as plt
from pfft import *
# Let's use the data from above again.
N = 2**10 # number of sample points
Tmax = 2 # s
#ts = np.linspace(0, 200/f, N)
#dt = np.diff(ts)[-1]
dt = Tmax/N
ts = np.arange(N) * dt
f = (200)/(Tmax) # frequency Hz
omega = 2 * np.pi * f # angular frequency
fs = np.linspace(0, 1/dt, N)
y0 = np.sin(omega * ts)
f_nyquist = 1 / 2 / np.diff(ts)[-1]
f_nyquist, 1 / dt, f
(256.0, 512.0, 100.0)
int('{:0{bits}b}'.format(12, bits=4)[::-1], 2) # base 2
3
# test
a = np.linspace(0, 1023, 1024, dtype=complex)
assert(np.allclose(a, pfft.reverse_array(pfft.reverse_array(a))))
freqs = pfft.iterative_fft(y0)
plt.plot(fs, freqs.real)
plt.plot(fs, freqs.imag)
#plt.xlim(0, f_nyquist)
plt.axvline(f, linewidth=0.5, linestyle='--')
#plt.axvline(f_nyquist-f, linewidth=0.5, linestyle='--')
plt.xlabel('frequency (Hz)')
plt.ylabel('iterative fft amplitude')
Text(0, 0.5, 'iterative fft amplitude')
plt.figure(figsize=(15,5))
plt.plot(ts, pfft.iterative_ifft(freqs).real)
plt.plot(ts, pfft.iterative_ifft(freqs).imag)
plt.plot(ts, y0, '--', linewidth=0.5)
[<matplotlib.lines.Line2D at 0x7f318a413dc0>]
plt.figure(figsize=(15,5))
plt.plot(ts, y0)
[<matplotlib.lines.Line2D at 0x7f318a3f0a60>]
assert(np.allclose(np.fft.fft(y0), pfft.iterative_fft(y0)))
assert(np.allclose(np.fft.ifft(y0), pfft.iterative_ifft(y0)))
%timeit pdft.dft_no_matrix(y0)
%timeit pdft.dft(y0, vectorize=False)
%timeit pdft.dft(y0)
%timeit pfft.recursive_fft(y0)
%timeit pfft.iterative_fft(y0)
%timeit np.fft.fft(y0)
1.29 s ± 77.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.41 s ± 168 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
57.7 ms ± 382 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
7.95 ms ± 66.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.34 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.6 µs ± 41.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Judging from the timings above, for this simple problem, poorly implemented, recursive_fft() runs roughly 550 times faster than my non-vectorized $\mathcal{O}(N^2)$ DFT routine (and about 7 times faster than the vectorized matrix version), while numpy's far superior np.fft.fft runs nearly 600 times faster still.
1000**2 / (1000 * np.log2(1000))
100.34333188799373
The interval over which to sample and the number of samples are the parameters most directly under our control. It was briefly mentioned above that discarding the imaginary part of the FFT destroys the phase information, but this may have been obscured by the development of the crude FFT. Let's see how a phase shift in a pure $\sin{\omega t}$ signal affects the Fourier transform.
1/np.diff(ts)[-1]
512.0
Good frequencies are $$ f = \frac{n}{T} \bigg \rvert_{-N/2}^{N/2 - 1} \\ \omega = \frac{2\pi n}{T} \bigg \rvert_{-N/2}^{N/2 - 1} \\ k = \frac{2\pi n}{L} \bigg \rvert_{-N/2}^{N/2 - 1} \\ $$
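A quick check (a sketch) that np.fft.fftfreq returns exactly this $n/T$ grid, with $n$ running over $[0, N/2)$ and then $[-N/2, 0)$:
import numpy as np
N, T = 8, 2.0
n = np.append(np.arange(0, N//2), np.arange(-N//2, 0))
assert np.allclose(np.fft.fftfreq(N, d=T/N), n / T)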
N = 2**10
Tmax = 10
ts = np.linspace(0, 10, N)
dt = np.diff(ts)[-1]
f = (160)/(Tmax+dt) # Hz
f_nyquist = 1 / 2 / dt
assert(f <= f_nyquist)
fs = np.linspace(0, 1/np.diff(ts)[-1], N)
omega = 2 * np.pi * f # rad/s
# no phase shift series
y_t = np.sin(omega * ts)
Y_f = np.fft.fft(y_t)
# phase shift series
y_t1 = np.sin(omega * ts - np.pi/4)
Y_f1 = np.fft.fft(y_t1)
# the data below is deliberately not carefully chosen,
# so that its frequencies don't match up nicely with
# the sampling frequency above
y_t2 = (1 * np.sin(2 * np.pi * 2.5 * ts - np.pi/4) +
1 * np.sin(2 * np.pi * 4.5 * ts + 0)) # superposition of two sines
Y_f2 = np.fft.fft(y_t2)
print('f_nyquist={:4g} Hz, f={:4g} Hz'.format(f_nyquist, f))
plt.plot(ts, y_t)
plt.xlim(0, 3/f)
f_nyquist=51.15 Hz, f=15.9844 Hz
(0, 0.187683284457478)
plt.figure(figsize=(12,7))
plt.subplot(221)
plt.plot(ts, y_t)
plt.xlim(0, 3/f)
plt.subplot(222)
plt.plot(fs, Y_f.real, '.', label='fft(y_t).real')
plt.plot(fs, Y_f.imag, '.', label='fft(y_t).imag')
plt.legend()
plt.ylabel('fft amplitude')
plt.xlabel('frequency (Hz)')
plt.axvline(f, linestyle='--', linewidth=0.5, color='r')
#plt.xlim(0,fs[-1]/4)
plt.subplot(223)
plt.plot(ts, y_t1)
plt.xlim(0,3/f)
plt.subplot(224)
plt.plot(fs, Y_f1.real, '.', label='fft(y_t1).real')
plt.plot(fs, Y_f1.imag, '.', label='fft(y_t1).imag')
plt.legend()
plt.ylabel('fft amplitude')
plt.xlabel('frequency (Hz)')
plt.axvline(f, linestyle='--', linewidth=0.5, color='r')
#plt.xlim(0,fs[-1]/4)
<matplotlib.lines.Line2D at 0x7f317907bd60>
From the first row of plots, you should be able to convince yourself that there should be only an imaginary piece to the frequency spectrum. From Euler's formula, $$e^{i\omega t} = \cos{\omega t} + i \sin{\omega t},$$ the FT contains a sine and a cosine contribution, but our initial signal was $\sin{\omega t}$; the cosine amplitude should therefore be zero.
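To see this in the continuum language (a sketch, using the delta-function identity from earlier), write the sine in exponentials:
$$ Y(f) = \int_{-\infty}^{\infty} e^{2\pi i f t} \sin(2\pi f_0 t)\, dt = \frac{1}{2i}\left[\delta(f+f_0) - \delta(f-f_0)\right] = \frac{i}{2}\left[\delta(f-f_0) - \delta(f+f_0)\right], $$
which is purely imaginary, consistent with the plot.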
In the second pair of plots, a phase shift was applied. The shifted sine can be decomposed into a sine piece and a cosine piece with different weightings, and this is what we see: after the FFT there is now both a real and an imaginary amplitude at the same frequency. This was the cause of the loss of phase and amplitude earlier; without specifying that the datatype was complex, NumPy silently discarded the imaginary part of the arrays when casting to real floats, and we lost our phase information.
In the plot below, frequencies incommensurate with the frequency grid of the sampling were picked deliberately. Even though only two frequencies (2.5 Hz and 4.5 Hz) are present in the signal data, there are nonzero amplitudes at many frequencies, with the maximum amplitudes where you'd expect them; i.e., at the frequencies of the two summed sine waveforms.
The frequency spectrum has twice as many peaks as you'd expect, because for a real signal the components satisfy $Y_{N/2-n} = Y^*_{N/2+n}$. Since bin $N/2$ corresponds to $f_{\text{Nyquist}}$, the duplicated peaks are mirrored about the Nyquist frequency.
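This symmetry is easy to verify directly (a sketch):
import numpy as np
y = np.random.random(16)                       # any real signal
Y = np.fft.fft(y)
assert np.allclose(Y[1:], Y[1:][::-1].conj())  # Y[N-n] = Y[n]* for n = 1..N-1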
plt.figure(figsize=(12,4))
plt.subplot(121)
plt.plot(fs, Y_f2.real, '.', label='fft(y_f2).real')
plt.plot(fs, Y_f2.imag, '.', label='fft(y_f2).imag')
plt.legend()
plt.ylabel('fft amplitude')
plt.xlabel('frequency (Hz)')
#plt.xlim(0,fs[-1]/2)
plt.axvline(f_nyquist, linestyle='--', color='r', linewidth=0.5)
plt.subplot(122)
plt.plot(fs, Y_f2.real, '.', label='fft(y_f2).real')
plt.plot(fs, Y_f2.imag, '.', label='fft(y_f2).imag')
plt.legend()
plt.ylabel('fft amplitude')
plt.xlabel('frequency (Hz)')
plt.xlim(0,10)
plt.axvline(2.5, linestyle='--', color='r',linewidth=0.5)
plt.axvline(4.5, linestyle='--', color='r', linewidth=0.5)
<matplotlib.lines.Line2D at 0x7f3178ed0730>
The above discussion only used frequencies below the Nyquist frequency. If we sample above this, we run into problems.
Let's look at a signal at 5 Hz, with the sampling chosen so that the Nyquist frequency is 3 Hz.
f_nyquist = 3 #Hz
dt = 1 / f_nyquist / 2
N = 100
ts = np.linspace(0,N*dt,N)
y_t = np.sin(2*np.pi*5*ts)
fs = np.linspace(0,1/dt,N)
plt.plot(fs, np.fft.fft(y_t).real, '.')
plt.plot(fs, np.fft.fft(y_t).imag, '.')
#plt.xlim(0,f_nyquist)
plt.xlabel('frequency (Hz)')
plt.ylabel('Y_f Amplitude')
plt.title('Folded-Back Frequency')
Text(0.5, 1.0, 'Folded-Back Frequency')
The doubled FFT frequency spectrum contains what looks like a 5 Hz signal, but if we look only at the domain [0, f_nyq] it would appear that there is a signal at 1 Hz. If we were handed this spectrum blind, we would have no way of knowing whether the 'real' frequency of the input signal was the 1 Hz or the 5 Hz peak. This is aliasing.
The full domain runs from 0 to the sampling frequency, and the doubled peaks are obtained by folding the plot about its center. The difference between the input frequency and the Nyquist frequency is $$ f_0 - f_{nq} = 5\,\mathrm{Hz} - 3\,\mathrm{Hz} = 2\,\mathrm{Hz} $$
If we move that distance from $f_{nq}$ again, we'll find our aliased frequency.
$$ f_{nq} - (f_0 - f_{nq}) = 2f_{nq} - f_0 = 1\,\mathrm{Hz} $$This is what we see in the plot above. Let's look at the time domain.
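The same fold-back arithmetic in code (a sketch, using the 5 Hz / 3 Hz values from the cell above):
f0, f_nq = 5.0, 3.0
print(2 * f_nq - f0)  # 1.0 Hz: where the aliased peak lands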
plt.plot(ts, np.fft.ifft(np.fft.fft(y_t)).real, '.', label='ifft(fft(ys))')
ts1 = np.linspace(0, N*dt, 100*N)
plt.plot(ts1, np.sin(2*np.pi*5*ts1), label='ys, higher sample rate')
plt.plot(ts1, np.sin(2*np.pi*1*ts1-np.pi*7/6),'--')
plt.legend()
plt.xlabel('time (s)')
plt.ylabel('amplitude y(t)')
plt.xlim(0,2.5)
(0, 2.5)
The Fourier-transformed data comes back with an apparent frequency far lower than the input frequency, since we sampled at less than twice the signal's frequency. In handling real data, one would want either to estimate the frequency content before sampling, or to take as high a sample rate as feasible, before analysing the data with signal processing methods.
We can use the Fourier transform to take derivatives of analytic, periodic functions.
def deriv(f, dx=1, d=1):
"""take the dth derivative of f,
with windowing dx."""
k = 2 * np.pi * np.fft.fftfreq(len(f), dx)
return np.fft.ifft((1j*k)**d * np.fft.fft(f))
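A quick sanity check of deriv (a sketch): on a grid spanning exactly one period, the spectral derivative of a sine should reproduce the cosine to machine precision.
import numpy as np
_N, _L = 64, 2 * np.pi
_x = np.arange(_N) * (_L / _N)   # periodic grid, endpoint excluded
assert np.allclose(deriv(np.sin(_x), dx=_L/_N, d=1).real, np.cos(_x))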
N = 2**5
L = 10
_dx = L/N
_xs = np.arange(N) * _dx
print('_f_nyq = {fnyq:4g}'.format(fnyq= 1 / 2 / _dx))
print('dk = {k:4g}'.format(k=2*np.pi/L))  # k-grid spacing, not the sample rate
__ks = 2 * np.pi * np.fft.fftfreq(N, _dx)
phi = 2 * np.pi * _xs / L
_ys = np.exp(np.cos(phi))
_dys = deriv(f=_ys, dx=_dx, d=1)
_ddys = deriv(f=_ys, dx=_dx, d=2)
a_dys = -2 * np.pi / L * _ys * np.sin(phi)
a_ddys = (2 * np.pi / L)**2 * _ys * (np.sin(phi)**2 - np.cos(phi))
plt.figure(figsize=(10,5))
dp1 = plt.subplot(121)
dp1.plot(_xs, _ys, label='f')
dp1.plot(_xs, _dys, label='df')
dp1.plot(_xs, _ddys, label='ddf')
dp1.legend()
dp2 = plt.subplot(122)
dp2.plot(_xs, a_dys - _dys)
dp2.plot(_xs, a_ddys - _ddys)
plt.tight_layout()
_f_nyq = 1.6 dk = 0.628319
/home/ryan/apps/miniconda/envs/work/lib/python3.8/site-packages/numpy/core/_asarray.py:85: ComplexWarning: Casting complex values to real discards the imaginary part return array(a, dtype, copy=False, order=order)
#_ys2 = np.array(list(map(lambda x: np.exp(np.cos(2 * np.pi * x / L)), _xs)))
#assert(np.allclose(_ys, _ys2))
_k = 2 * np.pi / L
_ks = np.append(np.arange(0, N/2), np.arange(-N/2, 0)) * _k
plt.plot(_ks)
plt.plot(__ks)
np.allclose(_ks, __ks)  # just to make sure of what np.fft.fftfreq does
True
Errors come in two flavors, in physics lingo: ultraviolet (UV) and infrared (IR).
A wave cannot fit in the box if it touches fewer than two grid points per cycle; i.e., the smallest wavelength that can fit in your box is $2\,dx$. Imagine an 8-point lattice with a standing wave of this wavelength:
$$ \text{(1) } \uparrow \downarrow \uparrow \downarrow \uparrow \downarrow \uparrow \downarrow \\ \text{(2) } \uparrow \uparrow \uparrow \uparrow \downarrow \downarrow \downarrow \downarrow $$Case (1) has the smallest wavelength possible. Anything less than this and we can't represent it accurately; this will cause aliasing, and the resultant aliased wavelength will be larger than intended (lower energy). See Numerical Recipes for a discussion of aliasing as energy bleeding out of the edge of the power spectral density (PSD).
These are called ultraviolet errors because short wavelengths correspond to high-energy waves (particles); the errors thus live in the ultraviolet regime of the electromagnetic scale.
Likewise, a wave with too large a wavelength will begin to lose features. If the wavelength is larger than case (2) above, you may end up with IR errors. We could fit half a wavelength into the box, but the periodic boundary conditions would introduce a discontinuity (consider what happened when there was just one extra zero in a Fourier-transformed data set from improperly chosen boundary conditions!).
Consider a Gaussian in a centered box:
$$ f(x) = e^{-x^2 / 2 \sigma^2}, \qquad f'(x) = \frac{-x}{\sigma^2}f(x), \qquad f(k) = \sqrt{2\pi} \sigma e^{-k^2 \sigma^2/2} $$We want the function to achieve machine precision. In this case, that means approaching $\epsilon = 2^{-52} \approx 2 \times 10^{-16}$.
The full function must fit in the box (IR error).
The frequency must not alias out (UV error).
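Working the two constraints out for the Gaussian gives the rough estimates coded below: the IR condition sets the box size, and the UV condition sets the number of points (a sketch of the algebra):
$$ f(L/2) = e^{-L^2/8\sigma^2} = \epsilon \;\Rightarrow\; L = \sigma\sqrt{-8\ln\epsilon}, \qquad f(k_{\max}) = \epsilon \;\Rightarrow\; k_{\max} = \frac{\pi N}{L} = \frac{1}{\sigma}\sqrt{-2\ln\frac{\epsilon}{\sqrt{2\pi\sigma^2}}} $$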
# gaussian in a centered box
N = 2**10
L = 10
dx = L/N
xs = np.arange(N) * dx - L / 2
sigma = 1.0
k = 2 * np.pi * np.fft.fftfreq(N, dx)
f = np.exp(-xs**2 / 2 / sigma**2)
dfft = deriv(f=f, dx=dx, d=1)
df = - xs / sigma**2 * f
error = df - dfft
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.plot(xs, f)
plt.plot(xs, df)
plt.subplot(122)
plt.plot(xs, error)
/home/ryan/apps/miniconda/envs/work/lib/python3.8/site-packages/numpy/core/_asarray.py:85: ComplexWarning: Casting complex values to real discards the imaginary part return array(a, dtype, copy=False, order=order)
[<matplotlib.lines.Line2D at 0x7f3170d7cd30>]
# Let's look at the error over a range of values,
# and shoot for the values that get us close to machine precision.
# Use our rough estimate for L_opt
eps = np.finfo(float).eps
L = np.sqrt(-8 * np.log(eps)) * sigma
N_opt = int(np.ceil((L / np.pi / sigma) * np.sqrt(
-2 * np.log(eps / np.sqrt(2 * np.pi * sigma**2)))))
print(eps, '\n', L,'sigma', '\n', N_opt)
Ns = np.arange(4,64)
Ls = np.linspace(L/2, 2*L, 100)
def get_error(N, L, sigma):
"""return the error for a periodic function,
given its sampling and analytic derivative"""
dx = L/N
xs = np.arange(N) * dx - L / 2
k = 2 * np.pi * np.fft.fftfreq(N, dx)
f = np.array(list(map(lambda x: np.exp(-x**2 / 2 / sigma**2), xs)))
df = np.array(list(map(lambda x: -x / sigma**2 * (
np.exp(-x**2 / 2 / sigma**2)), xs)))
dfft = deriv(f=f, dx=dx, d=1)
error = df - dfft
return abs(error/abs(df).max()).max()
errs_N = [get_error(N=_N, L=L, sigma=sigma) for _N in Ns]
errs_L = [get_error(N=N_opt, L=_L, sigma=sigma) for _L in Ls]
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.semilogy(Ns, errs_N)
plt.subplot(122)
plt.semilogy(Ls, errs_L)
2.220446049250313e-16 16.980848833699017 sigma 47
[<matplotlib.lines.Line2D at 0x7f316e3d6e80>]
This estimation scheme for optimal values of N and L is great if we have an analytic form to play with, but what if the function is too complicated for this? We have a few options:
1.3/0.0667
19.49025487256372
ts = np.linspace(0,10, 2**10)
a = np.exp(-(ts-5)**2)
b = np.zeros(len(ts))
for i in range(len(ts)):
if 0.5 < ts[i] < 1.5: b[i] = 0.5
else: b[i] = 0
plt.plot(ts, a)
plt.plot(ts, b)
c = 0.01 * np.convolve(a, b, mode='same')
plt.plot(ts, c)
plt.xlabel('ts')
plt.ylabel('y(ts)')
plt.tight_layout()
# compare to manually fourier transforming, then transforming back
plt.plot(ts, 0.01 * np.correlate(a,b, mode='same'))
c2 = 0.01 * np.fft.ifft(np.fft.fft(a)*np.fft.fft(b))
plt.plot(ts, c2)
/home/ryan/apps/miniconda/envs/work/lib/python3.8/site-packages/numpy/core/_asarray.py:85: ComplexWarning: Casting complex values to real discards the imaginary part return array(a, dtype, copy=False, order=order)
[<matplotlib.lines.Line2D at 0x7f318a0e9730>]
from scipy.signal import periodogram
ts = np.linspace(0, 2*np.pi, 2*10)
dt = np.diff(ts)[-1]
fs = np.linspace(0, 1/dt, len(ts))
a = np.sin(2*np.pi*1/dt/3 * ts) + 0.7*np.sin(2*np.pi*1/dt/5.4 * ts + np.pi/9)
print(np.diff(fs)[-1], fs[-1]/3)
freqs, Pxx = periodogram(a, 1/dt)
plt.plot(freqs, Pxx)
plt.axvline(fs[-1]/3)
corr = np.correlate(a,a, mode='same')
power = 0.001*np.fft.fft(corr).real[:]**2 + np.fft.fft(corr).imag[:]**2
plt.plot(fs, power)
plt.xlim(0,fs[-1]/2)
0.15915494309189526 1.0079813062486702
(0, 1.5119719593730054)
We will need to integrate a function to obtain the correlation between it and itself, lagged. The correlation is defined as $$ \mathrm{Corr}[y](\tau) = \int_{-\infty}^{\infty} y^*(t)\, y(t + \tau)\, dt $$
def integrate(func, ts):
    """integrate an array using Newton-Cotes trapezoid method. """
    # note: for a grid that includes both endpoints, the true spacing is
    # (ts[-1] - ts[0]) / (len(ts) - 1); using len(ts) biases the result at O(1/N),
    # which accounts for the ~1e-4 relative error printed below
    dt = (ts[-1] - ts[0]) / len(ts)
if hasattr(func, '__len__') and (not isinstance(func, str)):
data = func
else:
data = func(ts)
return (sum(data[1:-1]) + (data[-1] + data[0])/2) * dt
ts = np.linspace(0,np.pi,10000)
dt = ((ts[-1]-ts[0])/len(ts))
def y_t(t):
return np.sin(t)
I = integrate(y_t, ts)
error = (2 - I) / I
I, error, dt
(1.9997999835490183, 0.00010001822813634565, 0.0003141592653589793)
Correlation (circular, matching the code below): $$ \mathrm{Corr}[A, B][n] = \sum_{m = 0}^{N-1} A^*[m]\, B[(n+m) \bmod N] $$
np.correlate(a, b)
def correlation(a, b):
"""return the correlation of two arrays."""
Cab = 0 * a
N = len(a)
for i in range(0,N):
for j in range(0,N):
Cab[i] += a[j].conj() * b[(i + j)%N]
return Cab
correlation(a,b)
# note: a (length 20, from the periodogram cell) and b (length 1024, from the
# convolution cell) have different shapes, hence the ValueError below
_ffta, _fftb = np.fft.fft(a), np.fft.fft(b)
_fftab = np.fft.fft(a.conj() * b)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-43-2aac3bdf681d> in <module> 12 13 _ffta, _fftb = np.fft.fft(a), np.fft.fft(b) ---> 14 _fftab = np.fft.fft(a.conj() * b) ValueError: operands could not be broadcast together with shapes (20,) (1024,)
from scipy.signal import fftconvolve
assert np.allclose(np.convolve(a, b), fftconvolve(a,b))
%timeit np.convolve(a,b)
%timeit fftconvolve(a,b)
plt.plot(np.convolve(a,b))
#plt.plot(correlation(a,b))
#plt.plot(np.correlate(a,b,'same'))
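For reference, a working version of the FFT-based circular correlation that the broken cell above was attempting (a sketch, with freshly generated arrays so the shapes match); the identity is $\mathrm{Corr}[a,b] = \mathcal{F}^{-1}\left[\mathcal{F}[a]^* \mathcal{F}[b]\right]$:
import numpy as np
rng = np.random.default_rng(0)
a1, b1 = rng.random(32), rng.random(32)   # same length, unlike a and b above
fft_corr = np.fft.ifft(np.fft.fft(a1).conj() * np.fft.fft(b1))
assert np.allclose(correlation(a1, b1), fft_corr)  # matches the double loop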
Parseval's Theorem relates the total energy in the time domain to the total energy in the frequency domain. The Fourier transform takes you from one basis to another, but the physical quantities associated with the system cannot change. Consider this in the continuous and discrete cases:
$$ \int dt\, e^{-i\omega t} y(t) = Y(\omega) \qquad \frac{1}{2\pi} \int d\omega\, e^{i \omega t} Y(\omega) = y(t) \\ \sum_{n=0}^{N-1} e^{-2 \pi i m n / N} y_n = Y_m \qquad \frac{1}{N} \sum_{m=0}^{N-1} e^{2 \pi i m n / N} Y_m = y_n $$Parseval's Theorem states that:
$$ \boxed{ \int dt \abs{y(t)}^2 = \frac{1}{2\pi}\int d\omega \abs{Y({\omega})}^2 \\ \sum_{n=0}^{N-1} \abs{y_n}^2 = \frac{1}{N} \sum_{m=0}^{N-1} \abs{Y_m}^2 } $$
assert(np.allclose(sum(np.square(np.abs(y0))), sum(np.square(np.abs(np.fft.fft(y0))))/len(y0)))
N = 32
L = 10
dx = L/N
x = np.arange(N) * dx - L/2
ks = 2 * np.pi * np.fft.fftfreq(N, dx)
def grad(f):
    # use ks defined just above (the global k from an earlier cell has the wrong length)
    return np.fft.ifft((1j*ks) * np.fft.fft(f))
def laplacian(f):
    return np.fft.ifft((1j*ks)**2 * np.fft.fft(f))
def grad_dot_grad(a, b):
return (laplacian(a*b) - a*laplacian(b) - b*laplacian(a))/2
# make two functions, a and b
a = np.sin(ks[4] * x)
b = np.exp(np.cos(ks[15] * x))
plt.plot(x, a)
plt.plot(x, b)
np.allclose(grad(a*b), a*grad(b) + b*grad(a))
False
Why does this fail? The set of frequencies accessible to a and b is the one returned by np.fft.fftfreq(), but a*b contains a different set of frequencies. Borrowing from a discussion by MMF, the DFT has frequencies $k_n = 2\pi n/L$, $n \in [-N/2, N/2)$. If we multiply our two functions $a_x$ and $b_x$, the spectrum now looks like
$$ a_x b_x = \frac{1}{N^2} \sum_{k_a, k_b} e^{i(k_a+k_b)x}A_{k_a}B_{k_b} = \frac{1}{N}\sum_p e^{ipx}C_p $$In other words, while the analytic derivative satisfies the product rule as used above, the FFT derivative cannot 'see' the frequencies of the new function that fall outside the old frequency basis.
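As a check on this explanation (a sketch, reusing grad, ks, and x from the cell above): if the two functions are band-limited enough that the sum and difference frequencies of their product still fit inside the basis, the spectral product rule is recovered.
a2 = np.sin(ks[2] * x)
b2 = np.sin(ks[3] * x)
# a2*b2 contains only ks[1] and ks[5], both inside the frequency basis
np.allclose(grad(a2 * b2), a2 * grad(b2) + b2 * grad(a2))  # True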
N = 2**10
ts = np.linspace(0, 3*np.pi, N)
xs = np.zeros(N)
dt = np.diff(ts)[-1]
fs = np.linspace(0, 1/dt, N)
for i in range(0,16):
xs += 0.9**i * np.sin(2 * np.pi * 2**i * ts)
ps = np.fft.fft(xs)
for i in range(N):
if fs[i] > 2:
ps[i] = 0.0
fft_xs = np.fft.ifft(ps)
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.plot(ts, xs)
plt.plot(ts, fft_xs.real)
plt.xlim(0,3)
freqs, pows = periodogram(xs, 1/dt)
freqs2, pows2 = periodogram(fft_xs.real, 1/dt)
plt.subplot(122)
plt.plot(freqs, pows)
plt.plot(freqs2, pows2)
plt.xlim(0,freqs[-1])
(0, 54.271835594334654)
def integrate(func, ts):
"""integrate an array using Newton-Cotes trapezoid method. """
import types
if type(func) == types.FunctionType:
data = func(ts)
elif hasattr(func, '__len__') and (not isinstance(func, str)):
data = func
else:
raise ValueError('Data provided is not a function or an array.')
    # for a grid that includes both endpoints, the spacing is (b - a)/(N - 1)
    dt = (ts[-1] - ts[0]) / (len(ts) - 1)
return (sum(data[1:-1]) + (data[-1] + data[0])/2) * dt
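A usage sketch covering both call styles:
ts = np.linspace(0, np.pi, 10001)
# note: np.sin itself is a ufunc, not a types.FunctionType, hence the lambda
print(integrate(lambda t: np.sin(t), ts))  # pass a callable
print(integrate(np.sin(ts), ts))           # or pass the sampled array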